kNNVWC: An Efficient k-Nearest Neighbors Approach Based on Various-Widths Clustering

نویسندگان

  • Abdulmohsen Almalawi
  • Adil Fahad
  • Zahir Tari
  • Muhammad Aamir Cheema
  • Ibrahim Khalil
چکیده

The approximate k-NN search algorithms are well-known for their high concert in high dimensional data. The locality-sensitive hashing (LSH) method, that uses a number of hash functions, is one of the most fascinating hash-based approaches. The k-nearest neighbour approaches based Various-Widths Clustering (kNNVWC) has been widely used as a prevailing non-parametric technique in many scientific and engineering applications. However, this approach incurs a huge pre-processing and the querying cost. Hence, this issue has become an active explore field. The proposed system presents a novel k-NN based Partitioning Around Medoids (KNNPAM) clustering algorithm to powerfully find k-NNs for a query object from a given data set to minimize the extend beyond among clusters; and grouping the centers of the clusters into a tree-like index to effectively trim more clusters. Experimental results demonstrate that KNNPAM perform well in finding k-NNs for query objects compared to a number of k-NN search algorithms, mainly for a banking domain and real world data set with high dimensions, various distributions and large size. The problem of quickly finding the “exact” k-NN for a query object in a large and high dimensional data set using metric reserve functions that satisfy the triangle inequality property.KD-tree: To organization the data set in a balanced binary-tree, where the data set is recursively split into two parts along one axis .

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...

متن کامل

Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used algorithms called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...

متن کامل

Sorted Nearest Neighborhood Clustering for Efficient Private Blocking

Record linkage is an emerging research area which is required by various real-world applications to identify which records in different data sources refer to the same real-world entities. Often privacy concerns and restrictions prevent the use of traditional record linkage applications across different organizations. Linking records in situations where no private or confidential information can...

متن کامل

Adaptive Nearest Neighbor Classifier Based on Supervised Ellipsoid Clustering

Nearest neighbor classifier is a widely-used effective method for multi-class problems. However, it suffers from the problem of the curse of dimensionality in high dimensional space. To solve this problem, many adaptive nearest neighbor classifiers were proposed. In this paper, a locally adaptive nearest neighbor classification method based on supervised learning style which works well for the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE Trans. Knowl. Data Eng.

دوره 28  شماره 

صفحات  -

تاریخ انتشار 2016